Demographic Word Embeddings for Racism Detection on Twitter
Abstract
Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this creates unique communication opportunities, it also presents important challenges; online racism is one such example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic information (age, gender, and location). Our methodology achieves reasonable classification performance on a gold-standard dataset (F1 = 76.3%) and significantly outperforms demographic-agnostic models.
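The abstract does not spell out how the demographic attributes enter the model. A minimal sketch under stated assumptions: the simplest demographic-aware baseline averages a tweet's word vectors, appends one-hot indicators for age group, gender, and location, trains a logistic regression classifier, and scores it with F1, the metric the abstract reports. The category lists, the toy embeddings, and every function name below are hypothetical. This is not the authors' method: the paper builds demographic information into the word embeddings themselves, whereas this sketch only concatenates demographic features onto a plain text vector.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical demographic vocabularies (assumption: not taken from the paper).
AGE_GROUPS = ["<25", "25-34", "35+"]
GENDERS = ["female", "male"]
LOCATIONS = ["US", "UK", "other"]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a categorical value; unknown values map to an all-zero vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def tweet_features(tokens: Sequence[str], emb: Dict[str, List[float]], dim: int,
                   age: str, gender: str, location: str) -> List[float]:
    """Average the vectors of known tokens, then append demographic one-hots."""
    known = [emb[t] for t in tokens if t in emb]
    if known:
        text_vec = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    else:
        text_vec = [0.0] * dim  # no in-vocabulary tokens
    return (text_vec + one_hot(age, AGE_GROUPS) + one_hot(gender, GENDERS)
            + one_hot(location, LOCATIONS))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean cross-entropy loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Label 1 (racist) if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A demographic-agnostic baseline, the kind of model the abstract compares against, corresponds to dropping the three one-hot blocks from `tweet_features`. On real data one would substitute pretrained tweet embeddings and evaluate on a held-out split.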
Related papers
Using Convolutional Neural Networks to Classify Hate-Speech
The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism), and non-hate-speech. Four convolutional neural network models were trained on, respectively, character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word ve...
Learning Multiview Embeddings of Twitter Users
Low-dimensional vector representations are widely used as stand-ins for the text of words, sentences, and entire documents. These embeddings are used to identify similar words or make predictions about documents. In this work, we consider embeddings for social media users and demonstrate that these can be used to identify users who behave similarly or to predict attributes of users. In order to...
Twitter Author Profiling Using Word Embeddings and Logistic Regression
The general goal of the author profiling task is to determine various social and demographic aspects of an author based on their writing. In this work, we propose an approach that combines word embeddings and classical logistic regression to identify author gender and language variety from the corresponding tweets. The model was trained on the PAN 2017 Twitter Corpus that contains ...
Mining Adverse Drug Reaction Mentions in Twitter with Word Embeddings
This paper describes our system used in the PSB 2016 Workshop on Social Mining Shared Task for adverse drug reaction (ADR) extraction in Twitter. Our system uses Conditional Random Fields to train a classifier for extracting ADR mentions. We leverage word representations from a large number of unlabeled tweets, both drug-related and generic. Our experimental results show that cluster features deriv...
Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings
It has been shown that learning distributed word representations is highly useful for Twitter sentiment classification. Most existing models rely on a single distributed representation for each word. This is problematic for sentiment classification because words are often polysemous and each word can contain different sentiment polarities under different topics. We address this issue by learnin...